feat(py_venv): Replace untenable copying with symlinks #644
- Refactor the new venv builder to use modular strategies
- Implement copying and symlinking among other strategies
- Implement some strategy combinators
- Refactor the venv tool so only new .pth and symlink strategies are usable
- Implement a _good_ symlink + patching strategy via combinators
- Make the previous copying behavior unreachable
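
To make the "strategies plus combinators" shape concrete, here is a minimal sketch. Every name in it (`Action`, `Strategy`, `SymlinkSitePackages`, `PthFallback`, `FirstMatch`) is hypothetical and chosen for illustration; none of it is taken from the actual venv tool, and the venv layout is deliberately flattened.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// One planned filesystem effect. Hypothetical type, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    /// Create `link` pointing at `target`.
    Symlink { target: PathBuf, link: PathBuf },
    /// Record `import_root` as a line in a `.pth` file.
    PthEntry { import_root: PathBuf },
}

/// A strategy looks at one source file and either claims it or declines.
/// The trait has no generic methods, so it can be used as `dyn Strategy`.
pub trait Strategy {
    fn plan(&self, src: &Path, venv: &Path) -> Option<Action>;
}

/// Claims files that live under a `site-packages` directory.
/// Assumption: links go under `<venv>/site-packages`, a simplified layout.
pub struct SymlinkSitePackages;

impl Strategy for SymlinkSitePackages {
    fn plan(&self, src: &Path, venv: &Path) -> Option<Action> {
        let mut parts = src.components();
        // Advance past the first `site-packages` component, if there is one.
        parts.find(|c| c.as_os_str() == OsStr::new("site-packages"))?;
        let rest = parts.as_path();
        Some(Action::Symlink {
            target: src.to_path_buf(),
            link: venv.join("site-packages").join(rest),
        })
    }
}

/// Claims any file that has a parent directory and exposes that directory
/// as an import root.
pub struct PthFallback;

impl Strategy for PthFallback {
    fn plan(&self, src: &Path, _venv: &Path) -> Option<Action> {
        let root = src.parent()?;
        Some(Action::PthEntry { import_root: root.to_path_buf() })
    }
}

/// Combinator: try each strategy in order and take the first that claims the file.
pub struct FirstMatch(pub Vec<Box<dyn Strategy>>);

impl Strategy for FirstMatch {
    fn plan(&self, src: &Path, venv: &Path) -> Option<Action> {
        self.0.iter().find_map(|s| s.plan(src, venv))
    }
}
```

Because `FirstMatch` is itself a `Strategy`, combinators nest, and a file that no strategy claims surfaces as `None`, which gives the caller one place to report an error.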
I'm having trouble running the tests on my machine; we can talk about it synchronously later.

The meat of your question to me, I think, is about dyn-compatibility and how we can make the source code simpler when branching on different types.

Short answer: virtualization is mutually exclusive with generics. Especially for functions where you only genericize because you're widening your Postel acceptance, you should get in the habit of writing your primary logic with a concretely lowered type (here

The reason for this is that the ENTIRE function body of a generic function gets monomorphized for EVERY set of type arguments you use it with. If you have an inner non-generic function, it only exists once (function items are always global to the translation unit, even if they are syntactically nested inside another function; nesting only affects symbol visibility in the privacy system), so the outer generic function only monomorphizes as a couple of function calls.

I'll put up a child PR into this one that illustrates switching the

I also have style notes on other aspects of the code, but I'll save those for after the virtualization.
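
As a minimal sketch of the inner-function pattern described above (the function name `venv_bin_dir` and its body are hypothetical and not taken from this PR): the generic outer function only lowers its argument to `&Path`, and the real logic lives in a nested non-generic function that is compiled once.

```rust
use std::path::{Path, PathBuf};

/// Generic shim: accepts anything path-like (`&str`, `String`, `PathBuf`, ...).
/// Each instantiation monomorphizes only the `as_ref()` call and the call to `inner`.
pub fn venv_bin_dir<P: AsRef<Path>>(venv_root: P) -> PathBuf {
    // Non-generic body: exists once in the binary, no matter how many
    // different `P` types callers use.
    fn inner(venv_root: &Path) -> PathBuf {
        venv_root.join("bin")
    }
    inner(venv_root.as_ref())
}
```

Calling `venv_bin_dir("venv")` and `venv_bin_dir(String::from("venv"))` produces two small shims that share the same `inner`.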
OK, I'm not tired yet. See #648
Code review per request

---------

Co-authored-by: Reid D. McKenzie <[email protected]>
As reported by customers, the naive but correct strategy of using copies in `py_venv_*` can lead to laughable disk usage. Some clients are reporting slowdowns on the order of 10 minutes and on the order of 100 GiB of disk wasted copying inputs into binaries. We need a more scalable strategy such as symlinking.

Thankfully we can generate symlinks from tools driven by Bazel into a TreeArtifact so long as the symlinks aren't dangling. By carefully crafting relative symlinks we're able to produce a tree of links which is valid both at and after action time. When relocating a `.runfiles` tree containing such links (for instance into an OCI layer tar) these links must be dereferenced, but that Just Works.

While I'm at it, refactor the venv machinery to operate in terms of strategies and combinators on strategies so that it's simpler to talk about the production-grade behavior we want, which is:

- `site-packages` trees in 1stparty code get relocated/linked into the venv
- `bin` sibling trees in 1stparty code get relocated/patched into the venv
- `.pth` file entries
- `bin` sibling trees in 3rdparty code get relocated/patched into the venv

This makes the venv builder significantly more flexible, allows for better error reporting, and opens the door to more flexible error handling.
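
To illustrate what "carefully crafting relative symlinks" can mean in practice, here is a small sketch of computing the link text. It assumes both paths are already normalized and expressed relative to the same root (for example the root of the output tree), and that no ancestor directory is itself a symlink. The function name `relative_link_target` is hypothetical, not part of the tool.

```rust
use std::path::{Component, Path, PathBuf};

/// Compute the text to store in a symlink at `link` so that it resolves to
/// `target`. Both paths must be relative to the same root (assumption).
pub fn relative_link_target(link: &Path, target: &Path) -> PathBuf {
    // Directory that will contain the link; empty when `link` sits at the root.
    let link_dir: Vec<Component> = link
        .parent()
        .map(|p| p.components().collect())
        .unwrap_or_default();
    let target_parts: Vec<Component> = target.components().collect();

    // Number of leading components the two paths share.
    let common = link_dir
        .iter()
        .zip(&target_parts)
        .take_while(|(a, b)| a == b)
        .count();

    let mut out = PathBuf::new();
    // Climb out of the unshared part of the link's directory...
    for _ in common..link_dir.len() {
        out.push("..");
    }
    // ...then descend into the unshared part of the target.
    for part in &target_parts[common..] {
        out.push(part.as_os_str());
    }
    out
}
```

On Unix the result could then be passed as the first argument to `std::os::unix::fs::symlink`. Since the link never mentions an absolute path, it stays valid when the whole tree is moved as a unit.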
Incorporates an implementation of #606, but testing is required.
Should include an implementation of #635, but testing is required.
Changes are visible to end-users: yes

`py_venv_*` now use symlinks rather than hard file copies, which radically reduces disk usage while improving venv building performance.

Test plan

TODO.
Remaining work

- `site-packages/__init__.py` file to be linked
- `site-packages/__init__.py` file will not be linked